
Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa



Abstract

BACKGROUND: Imputation techniques used to handle missing data are based on the principle of replacement. Multiple imputation is widely advocated as superior to other imputation methods; however, studies have suggested that simple methods for filling in missing data can be just as accurate as complex ones. The objective of this study was to implement a number of simple and more complex imputation methods and to assess their effect on the performance of undiagnosed diabetes risk prediction models during external validation.

METHODS: Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models’ discrimination was assessed and compared using the C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment.

RESULTS: The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performance, while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation yielded the highest C-statistic only for the Rotterdam Predictive model, and that value was matched by simpler imputation methods.

CONCLUSIONS: Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.
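The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the toy records and the coefficients in `hypothetical_lp` are all assumptions made up for illustration, and only deletion and mean imputation of a single predictor are shown. The sketch computes a C-statistic as the proportion of case/non-case pairs in which the case receives the higher score, and performs intercept recalibration by finding the constant shift that makes the mean predicted risk equal the observed prevalence.

```python
import math


def c_statistic(outcomes, scores):
    """Share of (case, non-case) pairs where the case scores higher; ties count 0.5."""
    cases = [s for y, s in zip(outcomes, scores) if y == 1]
    controls = [s for y, s in zip(outcomes, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def complete_cases(rows):
    """Deletion: keep only records with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows, column):
    """Simple imputation: replace missing values in one column by the observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    return [dict(r, **{column: mean if r[column] is None else r[column]}) for r in rows]


def intercept_shift(outcomes, linear_predictors, tol=1e-10):
    """Constant added to every linear predictor so that the mean predicted
    risk matches the observed prevalence (found by bisection)."""
    target = sum(outcomes) / len(outcomes)

    def mean_risk(delta):
        risks = [1.0 / (1.0 + math.exp(-(lp + delta))) for lp in linear_predictors]
        return sum(risks) / len(risks)

    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def hypothetical_lp(row):
    """Made-up risk model; the coefficients are illustrative, not from any published score."""
    return -6.0 + 0.05 * row["age"] + 0.10 * row["bmi"]


if __name__ == "__main__":
    # Toy records, invented for this sketch; None marks a missing BMI.
    rows = [
        {"age": 45, "bmi": 31.0, "diabetes": 1},
        {"age": 60, "bmi": None, "diabetes": 1},
        {"age": 38, "bmi": 24.5, "diabetes": 0},
        {"age": 52, "bmi": None, "diabetes": 0},
        {"age": 29, "bmi": 22.0, "diabetes": 0},
        {"age": 66, "bmi": 35.0, "diabetes": 1},
    ]
    for label, data in [("deletion", complete_cases(rows)),
                        ("mean imputation", mean_impute(rows, "bmi"))]:
        y = [r["diabetes"] for r in data]
        lp = [hypothetical_lp(r) for r in data]
        shift = intercept_shift(y, lp)
        print(label, "n =", len(data), "C =", c_statistic(y, lp), "shift =", shift)
```

In this sketch the intercept shift is a single constant added to every individual's linear predictor, so it leaves the ranking of individuals, and therefore the C-statistic computed on the same records, unchanged. What differs between the two arms is which records enter the calculation and which values the imputed predictor takes.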
